(54) Multimedia data retrieval device and method

(57) The multimedia data retrieval device of this invention includes: a content storage section for storing a plurality of compressed contents; a client terminal for inputting feature data; a feature data storage section for reading feature data extracted from at least one of the compressed contents from the content storage section and storing the feature data of the at least one compressed content; and a content retrieval section for selecting, among the feature data stored in the feature data storage section, feature data approximate to the feature data input via the client terminal, and retrieving a content having the selected feature data from the content storage section.

FIG. 1
Description

BACKGROUND OF THE INVENTION 

1. FIELD OF THE INVENTION: 

[0001] The present invention relates to a multimedia 
data retrieval device located between a server which 
stores a plurality of contents representing images, 
sounds, and the like and a client who desires to retrieve 
content, for searching the contents to retrieve the con- 
tent desired by the client and providing the retrieved 
content to the client, and a retrieval method for such a 
retrieval device. 

2. DESCRIPTION OF THE RELATED ART: 

[0002] A conventional system for searching multimedia contents produces miniature images representing outlines of the respective contents. Together with such miniature images, data representing the features of the contents, such as image size and dominant color information, are created as feature data. Such feature data is designated directly in order to retrieve the content corresponding to the designated feature data.
[0003] Figure 17 is a view illustrating the construction of a conventional multimedia content retrieval system. Referring to Figure 17, multimedia contents are stored in a disk 103 mounted on a disk drive 101. The contents are read from the disk 103 under control of a file server 102, transmitted to the client side via a communication line 106, and displayed on a display 104 of a computer 105.

[0004] To facilitate retrieval of a desired content, the client inputs a feature keyword for the desired content, as shown in Figure 18. Property data representing the features of the plurality of contents stored in the disk 103 are stored in advance in the disk 103 in the form of a table, as shown in Figure 18. The computer 105 compares the feature keyword input by the client with the feature data stored in the disk 103, selects a certain number of feature data approximate to the feature keyword, ranked from most to least approximate, and displays miniature images of the contents having the selected feature data on the display 104. The client selects an appropriate content by referring to the displayed miniature images, thereby obtaining the desired content.

[0005] The above retrieval technique is disclosed, for example, in U.S. Patent No. 5,761,655 titled "Image File Storage and Retrieval System".

[0006] The above conventional retrieval technique has the disadvantage that, when contents are compressed by a coding method before being stored, the compressed contents must first be decompressed to produce non-compressed contents, and feature data must then be created from the non-compressed contents. Another disadvantage is that high-speed retrieval is not possible if feature data has not been created in advance.

[0007] In the above conventional retrieval technique, the client is required to express a feature of a desired content by a low-level keyword such as color, width, or height. When high-level retrieval is desired, the client cannot use a high-level expression such as "a scene where a person is running in the evening sun".

SUMMARY OF THE INVENTION 

[0008] The multimedia data retrieval device of this invention includes: a content storage section for storing a plurality of compressed contents; a client terminal for inputting feature data; a feature data storage section for reading feature data extracted from at least one of the compressed contents from the content storage section and storing the feature data of the at least one compressed content; and a content retrieval section for selecting, among the feature data stored in the feature data storage section, feature data approximate to the feature data input via the client terminal, and retrieving a content having the selected feature data from the content storage section.

[0009] In one embodiment of the invention, each of the compressed contents includes a plurality of macro blocks representing an image shape; the image shape represented by the macro blocks is converted into a value consisting of at least one bit, and the bit is used as feature data of a shape represented by the content.
[0010] In another embodiment of the invention, each of the compressed contents includes mesh-coded data representing an image shape, and the mesh-coded data is used as feature data of a shape represented by the content.

[0011] In still another embodiment of the invention, each of the compressed contents includes a plurality of macro blocks representing an image shape; an average of DC components of a luminance component (Y) and a DC component of each of the chrominance components (Pb, Pr) are obtained for each macro block, and the average and the DC components are used as feature data of color information and brightness information represented by the content.

[0012] In still another embodiment of the invention, each of the compressed contents includes a plurality of macro blocks representing an image shape; motions of an object represented by macro block motion information are read to obtain an average of the motions of the object, and the average is used as feature data of motion information of the object represented by the content.

[0013] In still another embodiment of the invention, each of the compressed contents includes a plurality of macro blocks representing an image shape; DC components and AC components of a luminance component and DC components and AC components of a chrominance component of an object represented by the macro blocks are read, and averages of the respective components are obtained and used as feature data of texture information of the object represented by the content.

[0014] In still another embodiment of the invention, 
each of the compressed contents includes frames rep- 
resenting sound. LPC coefficients recorded for each 
frame are read, and an average of the LPC coefficients 
is obtained and used as feature data of tone information 
represented by the multimedia content. 
[0015] In still another embodiment of the invention, 
each of the compressed contents includes frames rep- 
resenting sound, spectrum normalization coefficients 
recorded for each frame are read, and an average of the 
spectrum normalization coefficients is obtained for each 
predetermined time period and used as feature data of 
tone information. 
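The averaging in [0014] and [0015] can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the per-frame LPC coefficients (or spectrum normalization coefficients) have already been parsed out of the compressed audio frames into equal-length lists of floats, and the function names `average_coefficients` and `periodic_averages` are hypothetical. The "predetermined time period" is modeled, as an assumption, as a fixed number of frames.

```python
from typing import List, Sequence


def average_coefficients(frames: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise average of per-frame coefficient vectors ([0014])."""
    if not frames:
        raise ValueError("at least one frame is required")
    order = len(frames[0])
    if any(len(f) != order for f in frames):
        raise ValueError("all frames must have the same number of coefficients")
    # zip(*frames) yields, for each coefficient index, that coefficient in every frame
    return [sum(column) / len(frames) for column in zip(*frames)]


def periodic_averages(frames: Sequence[Sequence[float]],
                      frames_per_period: int) -> List[List[float]]:
    """Average coefficient vectors over consecutive fixed-length periods ([0015]).

    A trailing period shorter than frames_per_period is averaged over the
    frames it actually contains (an assumption; the text does not say).
    """
    if frames_per_period <= 0:
        raise ValueError("frames_per_period must be positive")
    return [average_coefficients(frames[start:start + frames_per_period])
            for start in range(0, len(frames), frames_per_period)]
```

For example, `average_coefficients([[1.0, -0.5], [0.5, 0.5]])` returns `[0.75, 0.0]`, and `periodic_averages` with a period of two frames turns three frames into two averaged vectors.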

[0016] In still another embodiment of the invention, 
each of the compressed contents includes frames rep- 
resenting sound, a prediction residual recorded for each 
frame is read, and the prediction residual is used as fea- 
ture data of rhythm information. 

[0017] In still another embodiment of the invention, 
each of the compressed contents includes frames rep- 
resenting sound, a frequency component after spec- 
trum normalization performed for each frame is read, 
and the frequency component is used as feature data of 
rhythm information. 

[0018] In still another embodiment of the invention, 
each of the compressed contents includes frames rep- 
resenting sound, LPC coefficients recorded for each 
frame are read, and a temporal change of the LPC coef- 
ficients is used as feature data of melody information. 
[0019] In still another embodiment of the invention, 
each of the compressed contents includes frames rep- 
resenting sound, spectrum normalization coefficients 
recorded for each frame are read, and a temporal 
change of the spectrum normalization coefficients is 
used as feature data of melody information.
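[0018] and [0019] use the temporal change of the coefficients as melody feature data. The text does not define "temporal change" precisely; the sketch below assumes it is the first-order difference between the coefficient vectors of consecutive frames, and the name `temporal_change` is hypothetical.

```python
from typing import List, Sequence


def temporal_change(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Frame-to-frame differences of coefficient vectors (assumed definition).

    Applies equally to LPC coefficients ([0018]) and spectrum normalization
    coefficients ([0019]); the result has one vector fewer than there are frames.
    """
    changes: List[List[float]] = []
    for previous, current in zip(frames, frames[1:]):
        if len(previous) != len(current):
            raise ValueError("all frames must have the same number of coefficients")
        changes.append([c - p for p, c in zip(previous, current)])
    return changes
```

For example, three frames `[[1.0, 0.0], [1.5, 0.5], [1.0, 1.0]]` yield the two change vectors `[[0.5, 0.5], [-0.5, 0.5]]`.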
[0020] In still another embodiment of the invention, 
each of the compressed contents includes a plurality of 
objects, an object description recorded for each object 
is read, and a frequency of appearance of a word, as 
well as a frequency of appearance of a combination of a 
word and a preceding or following word, used in the 
object description are used as feature data of word 
information. 
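The word features of [0020] amount to counting single words and adjacent word pairs in the object descriptions. A minimal sketch, assuming the object descriptions are available as plain English strings and that a simple lowercase regular-expression tokenizer is adequate; `word_features` is a hypothetical name. Counting each ordered adjacent pair once covers both the "preceding" and the "following" relation, since every pair records a word together with the word that follows it.

```python
import re
from collections import Counter
from typing import Iterable, Tuple


def word_features(descriptions: Iterable[str]) -> Tuple[Counter, Counter]:
    """Return (word frequencies, adjacent word-pair frequencies)."""
    word_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for text in descriptions:
        # Naive tokenizer: lowercase runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", text.lower())
        word_counts.update(words)
        # Each (word, next word) tuple is one occurrence of a word combination.
        pair_counts.update(zip(words, words[1:]))
    return word_counts, pair_counts
```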

[0021] According to another aspect of the invention, a multimedia data retrieval method is provided. The method includes the steps of: storing a plurality of compressed contents; inputting feature data via a client terminal; reading feature data extracted from the compressed contents and storing the feature data of the compressed contents; and selecting, among the stored feature data, feature data approximate to the feature data input via the client terminal, and retrieving a content having the selected feature data from the stored contents.

[0022] Alternatively, the multimedia data retrieval device of this invention includes: a content storage section for storing a plurality of contents; a client terminal for inputting a feature description text; a feature data storage section for reading feature data of the contents from the content storage section and storing the feature data of the contents; and a content retrieval section for extracting a keyword from the feature description text input via the client terminal, converting the keyword into feature data, selecting, among the feature data stored in the feature data storage section, feature data approximate to the feature data of the keyword, and retrieving a content having the selected feature data from the content storage section.

[0023] In one embodiment of the invention, the content retrieval section includes a keyword dictionary for converting a keyword into feature data, and the keyword extracted from the feature description text is converted into the feature data using the keyword dictionary.

[0024] In another embodiment of the invention, the content retrieval section extracts words of a major part of speech from the feature description text to be used as keywords.
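The keyword conversion of [0023] and [0024] can be illustrated with a dictionary lookup. This is only a sketch: the entries of `KEYWORD_DICTIONARY` are hypothetical placeholders (the actual table of Figure 16 is not reproduced here), `text_to_feature_data` is a hypothetical name, and, instead of a real part-of-speech tagger, any token found in the dictionary is treated as a keyword.

```python
import re
from typing import Dict, List, Tuple

FeatureEntry = Tuple[str, str]  # (feature item, feature value), hypothetical format

# Hypothetical entries for illustration only: keyword -> feature data.
KEYWORD_DICTIONARY: Dict[str, FeatureEntry] = {
    "running": ("motion", "fast"),
    "evening": ("color", "orange"),
    "sun": ("shape", "circle"),
}


def text_to_feature_data(text: str,
                         dictionary: Dict[str, FeatureEntry]) -> List[Tuple[str, FeatureEntry]]:
    """Extract keywords from a feature description text and convert them."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Stand-in for part-of-speech extraction: keep only words the dictionary knows.
    return [(token, dictionary[token]) for token in tokens if token in dictionary]
```

With the high-level expression from [0007], "A scene where a person is running in the evening sun", this sketch keeps "running", "evening", and "sun" and drops the remaining words.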

[0025] In still another embodiment of the invention, the content retrieval section uses shape information of a content as the feature data.

[0026] In still another embodiment of the invention, the content retrieval section uses color information and brightness information of a content as the feature data.
[0027] In still another embodiment of the invention, the 
content retrieval section uses motion information of a 
content as the feature data. 

[0028] In still another embodiment of the invention, the content retrieval section uses texture information of a compressed content as the feature data.
[0029] Alternatively, the multimedia data retrieval method of this invention includes the steps of: storing a plurality of contents; inputting a feature description text via a client terminal; reading feature data of the contents and storing the feature data; and extracting a keyword from the feature description text input via the client terminal, converting the keyword into feature data, selecting, among the stored feature data, feature data approximate to the feature data of the keyword, and retrieving a content having the selected feature data from the stored contents.

[0030] Thus, the invention described herein makes possible the advantages of (1) providing a multimedia data retrieval device capable of retrieving a content at high speed using high-level expression, and (2) providing a retrieval method for such a device.

[0031] These and other advantages of the present invention will become apparent to those skilled in the art upon reading and understanding the following detailed description with reference to the accompanying figures.
BRIEF DESCRIPTION OF THE DRAWINGS
[0032] 

Figure 1 is a view illustrating a construction of a 
multimedia content retrieval device of Example 1 
according to the present invention; 

Figure 2 is a view showing items of feature data of 
an object stored in a feature data memory in Exam- 
ple 1; 

Figure 3 is a view illustrating a data structure of a 
compressed content in Example 1; 

Figure 4 is a view illustrating a processing of 
extracting the shape of an object as feature data; 

Figure 5 is a view illustrating an alternative process- 
ing of extracting the shape of an object as feature 
data; 

Figure 6 is a view illustrating a processing of 
extracting the brightness of an object as feature 
data; 

Figure 7 is a view illustrating a processing of 
extracting the color of an object as feature data; 

Figure 8 is a view illustrating a processing of 
extracting motion information of an object as fea- 
ture data; 

Figure 9 is a view illustrating a processing of 
extracting texture information of an object as fea- 
ture data; 

Figure 10 is a view illustrating a data structure of a 
compressed audio content in Example 1; 

Figure 11 is a view illustrating a data structure of a 
compressed audio content in Example 1; 

Figure 12 is a view illustrating a data structure of a 
compressed multimedia content in Example 1; 

Figure 13 is a view illustrating a construction of a 
multimedia content retrieval device of Example 2 
according to the present invention; 

Figure 14 is a data table stored in a feature data 
memory in Example 2; 

Figure 15 is a view illustrating a construction of a 
content retrieval section in Example 2 in more 
detail; 

Figure 16 is a data table stored in a keyword dictionary in Example 2;

Figure 17 is a view illustrating a construction of a conventional multimedia content retrieval system; and

Figure 18 is a data table in the conventional system. 

DESCRIPTION OF THE PREFERRED EMBODIMENTS

(Example 1) 

[0033] Figure 1 is a view illustrating the construction of a multimedia content retrieval device of Example 1 according to the present invention. Referring to Figure 1, a multimedia content retrieval device 10 includes a content storage section 1, a content retrieval section 2, a client terminal 3, and communication lines 41 and 42. The content storage section 1 stores a plurality of coded compressed contents representing images, sounds, and the like. The content retrieval section 2 accesses the content storage section 1 to retrieve a content. The client terminal 3 requests the content retrieval section 2 to retrieve a content. Normally, a plurality of content storage sections 1 and a plurality of client terminals 3 are connected via communication lines so that each of the client terminals 3 can access any of the content storage sections 1 via the content retrieval section 2.

[0034] The content storage section 1 includes a file server 12 and a disk drive 13. The disk drive 13 records and reproduces a plurality of coded compressed contents on and from a disk 11. The file server 12 controls the disk drive 13 to control the recording and reproduction of contents on and from the disk 11, and performs data communication with external terminals via the communication line 41.

[0035] The content retrieval section 2 includes a feature data memory 21, a feature extraction/retrieval engine 22, and a data conversion portion 23. The feature extraction/retrieval engine 22 accesses the content storage section 1 via the communication line 41 so as to extract, for each of the contents stored in the disk 11, feature data from the plurality of objects included in that content, and stores the extracted feature data of the objects in the feature data memory 21. The data conversion portion 23 receives data from the client terminal 3 via the communication line 42 and converts the received data into feature data.

[0036] Figure 2 shows exemplary items of feature data of objects to be stored in the feature data memory 21. The exemplary items of feature data include the shape, color and brightness, motion, texture, tone, rhythm, melody, word, and the like of an object. Figure 2 merely shows the meaning of the respective items of feature data, which should be expressed in their respective formats. At least one of the items of feature data is selected in accordance with the type of the object to be used as feature data of the object.

EP 0 971 296 A2
[0037] The client terminal 3 includes a computer, a keyboard, a memory, a display, and the like. Upon receiving data through the client's operation of the keyboard and the like, the client terminal 3 transmits the data to the data conversion portion 23 of the content retrieval section 2 via the communication line 42. The data conversion portion 23 converts the data into the same format as that of the feature data stored in the feature data memory 21, and transfers the resultant feature data to the feature extraction/retrieval engine 22. The feature extraction/retrieval engine 22 searches the feature data memory 21 to select the feature data most approximate to the transferred feature data, and thus an object having the selected feature data, so as to determine a content including the object. The feature extraction/retrieval engine 22 instructs the file server 12 of the content storage section 1, via the communication line 41, to retrieve the determined content. The file server 12 reads the content from the disk 11 and supplies the content to the client terminal 3 via the content retrieval section 2. The client terminal 3 displays, reproduces, or records the retrieved content.

[0038] Figure 3 is a view of the data structure of a 
compressed content in this example. The content in this 
example is a multimedia content representing an image 
shape, sound, and the like, which is compressed by a 
compression coding method such as MPEG. When the 
content represents an image, the data structure of the 
content includes a header which contains information 
such as the size and compression method of the image, 
the bit rate at which the data is read and the frame rate 
at which the data is displayed after decompression, and 
the amount of data to be read at one time. 
[0039] In the MPEG method, each frame of an image is subjected to the discrete cosine transform (DCT) in units of blocks of 8 x 8 pixels. The coefficients obtained by the DCT are coded sequentially into variable-length codes, arranged in order from the DC component to the higher-frequency AC components. In the case of a color image, four adjacent blocks are used to obtain four blocks of the luminance component (Y) and one block for each of the chrominance components (Pb, Pr); these are arranged sequentially in the data structure, and this unit is called a macro block. The macro block may be subjected to motion-compensated prediction coding so that motion between frames can be compensated. In this case, data on the motion vector used for the motion compensation is inserted at the head of each macro block in the data structure.
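The macro block layout described above can be modeled as a small data structure. This sketch assumes the variable-length codes have already been fully entropy-decoded into quantized DCT coefficient values, each list in the DC-first order described above, so the DC term can be read without an inverse DCT. The class and function names are hypothetical and do not come from the MPEG specification.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Block = List[int]  # decoded DCT coefficients of one 8 x 8 block, DC first


@dataclass
class MacroBlock:
    """Four luminance (Y) blocks plus one Pb and one Pr block (assumed 4:2:0)."""
    y_blocks: List[Block]
    pb_block: Block
    pr_block: Block
    # Present only when the macro block is motion-compensated.
    motion_vector: Optional[Tuple[int, int]] = None

    def __post_init__(self) -> None:
        if len(self.y_blocks) != 4:
            raise ValueError("a macro block carries exactly four luminance blocks")


def dc_component(block: Block) -> int:
    # Coefficients run from the DC term to higher AC frequencies,
    # so the DC component is the first entry.
    return block[0]


def ac_components(block: Block) -> Block:
    return block[1:]
```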
[0040] In the MPEG4 compression method, an image in a frame is divided into layers, for example a layer representing a person in the foreground and a layer representing a mountain and the like in the background. A significant portion of the image of each layer is called an object, and only the macro blocks corresponding to the significant portion are recorded. The shape of the object is discernible from the transparency of pixels in a region including the object. Such shape data is coded for each macro block and inserted at a position preceding the motion vector data.

[0041] The details of MPEG4 are described in ISO/IEC 14496-1, -2, -3, Final Committee Draft of International Standard, May 1998.

[0042] Hereinbelow, the method for extracting feature data from a content compressed by the above MPEG method and retrieving a content using the feature data will be described in detail.

[0043] In the case of extracting the shape of an object as feature data, the feature extraction/retrieval engine 22 of the content retrieval section 2 sequentially scans the compressed contents stored in the disk 11 of the content storage section 1 to read the shape of the object represented in the corresponding macro blocks. At the same time, the feature extraction/retrieval engine 22 secures, in the feature data memory 21, a memory region for storing feature data composed of the same number of bits as the number of macro blocks in one frame. For example, referring to Figure 4, if all pixels in a macro block MB indicate "0" (transparent), the bit CB of the feature data corresponding to this macro block MB is set at "0". Likewise, if a macro block MB includes a pixel indicating "1" (opaque), i.e., if the macro block MB represents an object, the bit CB of the feature data corresponding to this macro block MB is set at "1". In this way, feature data indicating the shape of the object is obtained. The shapes of objects are extracted in this manner for all multimedia contents stored in the disk 11, and the feature data indicating the shapes of the objects are sequentially stored in the feature data memory 21.
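The shape extraction above can be sketched as follows. It assumes that the binary shape data of each macro block has already been decoded into per-pixel values (0 = transparent, 1 = opaque) and that the macro blocks of one frame are listed in raster order; `shape_feature` and `pack_bits` are hypothetical names.

```python
from typing import List, Sequence


def shape_feature(alpha_masks: Sequence[Sequence[int]]) -> List[int]:
    """One feature bit CB per macro block MB.

    A bit is 0 when every pixel of the macro block is transparent (0) and
    1 when at least one pixel is opaque (1), i.e. the block shows the object.
    """
    return [1 if any(pixel == 1 for pixel in mask) else 0 for mask in alpha_masks]


def pack_bits(bits: Sequence[int]) -> int:
    """Pack the bits into one integer, first macro block as the most significant bit."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value
```

For a frame of four 16 x 16 macro blocks in which only the second and third contain opaque pixels, `shape_feature` gives `[0, 1, 1, 0]`, which `pack_bits` stores as the integer 6, one bit per macro block.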
[0044] When the client attempts to retrieve a content containing an object of a desired shape, the client inputs data indicating the desired shape to the client terminal 3. The client terminal 3 transmits the data indicating the shape to the data conversion portion 23 of the content retrieval section 2. The data indicating the shape may be hand-written data or data in the same format as that of the feature data stored in the feature data memory 21. If the data transmitted from the client terminal 3 is hand-written data, the data conversion portion 23 discerns the shape indicated by the data, converts the discerned shape into feature data, and transfers the converted feature data to the feature extraction/retrieval engine 22. If the data transmitted from the client terminal 3 is in the same format as that of the feature data stored in the feature data memory 21, the data conversion portion 23 transfers the feature data to the feature extraction/retrieval engine 22. The feature extraction/retrieval engine 22 searches the feature data memory 21 to select an object having the feature data most approximate to the feature data transmitted from the client terminal 3, so as to determine a content including the object. The feature extraction/retrieval engine 22 instructs the file server 12 of the content storage section 1 to retrieve the determined content. The file server 12 reads the content from the disk 11 and supplies the content to the client terminal 3 via the content retrieval section 2.

[0045] The most approximate feature data is obtained as follows. Respective bits of the feature data transmitted from the client terminal 3 are compared with the corresponding bits of feature data stored in the feature data memory 21 to obtain the absolute values of the differences between each pair of corresponding bit values, and the sum of these absolute values (the difference) is calculated. This calculation is performed for all the shape feature data stored in the feature data memory 21, and the feature data that yields the smallest sum is designated as the most approximate feature data.
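The comparison in [0045] is a sum of absolute differences, which for 0/1 shape bits equals the Hamming distance; the same rule is applied to brightness and color in [0050] and to motion in [0053]. A minimal sketch, assuming the feature data memory can be modeled as a dictionary from a hypothetical object identifier to an equal-length feature vector:

```python
from typing import Dict, Hashable, Sequence, Tuple


def sum_of_absolute_differences(query: Sequence[float], stored: Sequence[float]) -> float:
    if len(query) != len(stored):
        raise ValueError("feature data must have the same length to be compared")
    return sum(abs(q - s) for q, s in zip(query, stored))


def most_approximate(query: Sequence[float],
                     feature_memory: Dict[Hashable, Sequence[float]]) -> Tuple[Hashable, float]:
    """Return the key of the stored feature data with the smallest sum, and that sum.

    On a tie the entry encountered first wins; an empty memory raises ValueError.
    """
    return min(((key, sum_of_absolute_differences(query, feature))
                for key, feature in feature_memory.items()),
               key=lambda item: item[1])
```

This is an exhaustive linear scan, matching the text's "performed for all the shape feature data"; an index structure could speed it up, but the text does not describe one.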
[0046] In the case where the shape of a content is indicated by the coordinates of the vertexes of a computer graphic wire-frame model, as shown in Figure 5, and the content is compressed by the mesh coding method of MPEG4, the feature extraction/retrieval engine 22 of the content retrieval section 2 extracts the mesh-coded data of all compressed contents stored in the disk 11 of the content storage section 1 as the respective feature data, and stores the extracted feature data in the feature data memory 21.

[0047] Upon receipt of mesh-coded data from the cli- 
ent terminal 3 as feature data, the feature extrac- 
tion/retrieval engine 22 searches the feature data 
memory 21 to select feature data having the smallest 
difference from the feature data transmitted from the cli- 
ent terminal 3 and determine a content having the 
selected feature data. The feature extraction/retrieval 
engine 22 instructs the file server 12 of the content stor- 
age section 1 to retrieve the determined content. The 
file server 12 reads the content from the disk 11, and 
supplies the content to the client terminal 3 via the con- 
tent retrieval section 2. 

[0048] In the case of extracting the color and brightness of an object as feature data, the feature extraction/retrieval engine 22 of the content retrieval section 2 sequentially scans the compressed contents stored in the disk 11 of the content storage section 1 to read, for each macro block in turn, the DC components of the luminance component (Y) as well as the DC components of the chrominance components (Pb, Pr). At the same time, the feature extraction/retrieval engine 22 secures, in the feature data memory 21, a memory region for storing feature data composed of three times as many bits as there are macro blocks in an object. Then, as shown in Figure 6, for example, the average of the DC components of the luminance components (Y) is calculated for each macro block MB, and the resultant averages of the macro blocks MB are stored in the feature data memory 21 as feature data. Also, as shown in Figure 7, for example, the DC components of the chrominance components (Pb, Pr) of each macro block MB are obtained as feature data and stored in the feature data memory 21. Thus, information on brightness and color is obtained as feature data of an object. In this way, color and brightness information is extracted for all multimedia contents stored in the disk 11 and sequentially stored in the feature data memory 21.
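The color and brightness feature of [0048] can be sketched as follows. The input format is an assumption: for each macro block of the object, the DC values of its four luminance (Y) blocks and the DC values of its Pb and Pr blocks, already decoded from the bitstream. The result holds three values per macro block, matching a memory region three times the number of macro blocks; `color_brightness_feature` is a hypothetical name, and `statistics.fmean` requires Python 3.8 or later.

```python
from statistics import fmean
from typing import List, Sequence, Tuple

# (DC values of the four Y blocks, DC of the Pb block, DC of the Pr block)
MacroBlockDC = Tuple[Sequence[int], int, int]


def color_brightness_feature(macro_blocks: Sequence[MacroBlockDC]) -> List[float]:
    feature: List[float] = []
    for y_dcs, pb_dc, pr_dc in macro_blocks:
        # Brightness: average of the luminance DC components of this macro block.
        # Color: the Pb and Pr DC components, stored unchanged.
        feature.extend([fmean(y_dcs), float(pb_dc), float(pr_dc)])
    return feature
```

Two macro blocks therefore produce six values, in the order brightness, Pb, Pr for each block.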

[0049] When the client attempts to retrieve a content including an object of desired brightness and color, the client inputs data indicating the desired brightness and color to the client terminal 3. The client terminal 3 transmits the data indicating the brightness and color to the data conversion portion 23 of the content retrieval section 2. The data indicating the brightness and color may be hand-written data or data in the same format as that of the feature data stored in the feature data memory 21. If the data transmitted from the client terminal 3 is hand-written data, the data conversion portion 23 discerns the brightness and color indicated by the data, converts the discerned brightness and color into feature data, and transfers the converted feature data to the feature extraction/retrieval engine 22. If the data transmitted from the client terminal 3 is in the same format as that of the feature data stored in the feature data memory 21, the data conversion portion 23 transfers the feature data to the feature extraction/retrieval engine 22. The feature extraction/retrieval engine 22 searches the feature data memory 21 to select an object having feature data most approximate to the feature data indicating brightness and color transmitted from the client terminal 3, and determines a content including the object. The feature extraction/retrieval engine 22 instructs the file server 12 of the content storage section 1 to retrieve the determined content. The file server 12 reads the content from the disk 11 and supplies the content to the client terminal 3 via the content retrieval section 2.
[0050] The most approximate feature data is obtained as follows. Respective bits of the feature data transmitted from the client terminal 3 are compared with the corresponding bits of feature data stored in the feature data memory 21 to obtain the absolute values of the differences between each pair of corresponding bit values, and the sum of these absolute values is calculated. This calculation is performed for all the feature data stored in the feature data memory 21, and the feature data that yields the smallest sum is designated as the most approximate feature data.

[0051] In the case of extracting the motion of an object as feature data, the feature extraction/retrieval engine 22 of the content retrieval section 2 sequentially scans the compressed contents stored in the disk 11 of the content storage section 1 to read the values of the motion of an object for the respective macro blocks MB and then calculates the average of these values, as shown in Figure 8, so as to store the temporally changing averages in the feature data memory 21 as feature data of motion information.
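The motion feature of [0051] is a time series of per-frame averages. A minimal sketch, assuming the motion values are the decoded motion vectors (dx, dy) of the object's macro blocks, grouped by frame. A frame whose macro blocks carry no motion vectors (for example an intra-coded frame) is recorded as `None`; that handling is an assumption of this sketch, not something the text specifies. `motion_feature` is a hypothetical name.

```python
from statistics import fmean
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[float, float]


def motion_feature(frames: Sequence[Sequence[Vector]]) -> List[Optional[Vector]]:
    """Average motion vector of the object for each frame, in the order given."""
    series: List[Optional[Vector]] = []
    for vectors in frames:
        if not vectors:
            series.append(None)  # no motion information in this frame
            continue
        series.append((fmean(dx for dx, _ in vectors),
                       fmean(dy for _, dy in vectors)))
    return series
```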

[0052] When the client attempts to retrieve a content containing an object of desired motion, the client inputs data indicating the desired motion to the client terminal 3. The client terminal 3 transmits the data indicating the motion to the data conversion portion 23 of the content retrieval section 2. The data indicating the motion may be hand-written data or data in the same format as that of the feature data stored in the feature data memory 21. If the data transmitted from the client terminal 3 is hand-written data, the data conversion portion 23 discerns the motion indicated by the data, converts the discerned motion into feature data, and transfers the converted feature data to the feature extraction/retrieval engine 22. If the data transmitted from the client terminal 3 is in the same format as that of the feature data stored in the feature data memory 21, the data conversion portion 23 transfers the feature data to the feature extraction/retrieval engine 22. The feature extraction/retrieval engine 22 searches the feature data memory 21 to select an object having feature data most approximate to the feature data indicating motion transmitted from the client terminal 3, and determines a content containing the object. The feature extraction/retrieval engine 22 instructs the file server 12 of the content storage section 1 to retrieve the content. The file server 12 reads the content from the disk 11 and supplies the content to the client terminal 3 via the content retrieval section 2.

[0053] The most approximate feature data is obtained as follows. Respective bits of the feature data transmitted from the client terminal 3 are compared with the corresponding bits of feature data stored in the feature data memory 21 to obtain the absolute values of the differences between each pair of corresponding bit values, and the sum of these absolute values is calculated. This calculation of the sum of absolute values is performed for all the feature data stored in the feature data memory 21, and the feature data that yields the smallest sum is designated as the most approximate feature data.
[0054] In the case of extracting texture information of an object as feature data, the feature extraction/retrieval engine 22 of the content retrieval section 2 sequentially scans the compressed contents stored in the disk 11 of the content storage section 1 and reads, for each macro block, the DC and AC components of the luminance components as well as the DC and AC components of the chrominance components, as shown in Figure 9. From these it obtains, for the entire object, an average of the DC components and an average of the AC components of the luminance components, as well as an average of the DC components and an average of the AC components of the chrominance components. The resulting averages are stored in the feature data memory 21 as feature data of the texture information. In this way, texture information is extracted for all multimedia contents stored in the disk 11 and sequentially stored in the feature data memory 21.
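The texture averaging of paragraph [0054] can be sketched in Python as follows. The `MacroBlock` container is hypothetical and stands in for coefficients already parsed from the compressed stream. The text does not say how signed AC values are combined, so this sketch averages their magnitudes; that choice is an assumption.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple

@dataclass
class MacroBlock:
    # Hypothetical container for the coefficients of one macro block.
    luma_dc: float
    luma_ac: List[float]
    chroma_dc: float
    chroma_ac: List[float]

def texture_feature(blocks: List[MacroBlock]) -> Tuple[float, float, float, float]:
    """Average luminance/chrominance DC and AC components over an object.

    Returns (luma DC avg, luma AC avg, chroma DC avg, chroma AC avg).
    AC components are averaged by magnitude (an assumption).
    """
    if not blocks:
        raise ValueError("object has no macro blocks")
    luma_dc = mean(b.luma_dc for b in blocks)
    luma_ac = mean(abs(c) for b in blocks for c in b.luma_ac)
    chroma_dc = mean(b.chroma_dc for b in blocks)
    chroma_ac = mean(abs(c) for b in blocks for c in b.chroma_ac)
    return (luma_dc, luma_ac, chroma_dc, chroma_ac)

# Two hypothetical macro blocks belonging to one object.
blocks = [
    MacroBlock(luma_dc=100, luma_ac=[4, -2], chroma_dc=60, chroma_ac=[1, -1]),
    MacroBlock(luma_dc=120, luma_ac=[-6, 0], chroma_dc=64, chroma_ac=[3, 1]),
]
print(texture_feature(blocks))
```

Because the DC and AC values are read straight from the compressed data, no inverse DCT is needed to build this feature.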

[0055] When the client attempts to retrieve a content containing an object having a desired texture, the client inputs data indicating the desired texture to the client terminal 3. The client terminal 3 transfers the data indicating the texture to the data conversion portion 23 of the content retrieval section 2. The data indicating the texture may be hand-written data or data in the same format as that of the feature data stored in the feature data memory 21. If the data transmitted from the client terminal 3 is hand-written data, the data conversion portion 23 discerns the texture indicated by the data, converts the discerned texture into feature data, and transfers the converted feature data to the feature extraction/retrieval engine 22. If the data transmitted from the client terminal 3 is in the same format as that of the feature data stored in the feature data memory 21, the data conversion portion 23 transfers the feature data to the feature extraction/retrieval engine 22. The feature extraction/retrieval engine 22 searches the feature data memory 21 to select an object having feature data most approximate to the feature data indicating the texture transmitted from the client terminal 3 and determines a content containing the object. The feature extraction/retrieval engine 22 instructs the file server 12 of the content storage section 1 to retrieve the content. The file server 12 reads the content from the disk 11 and supplies the content to the client terminal 3 via the content retrieval section 2.

[0056] The most approximate feature data is obtained as follows. Respective bits of the feature data transmitted from the client terminal 3 are compared with the corresponding bits of feature data stored in the feature data memory 21 to obtain the absolute values of the differences between each pair of corresponding bit values, and these absolute values are summed over all bits. This calculation is performed for all the feature data stored in the feature data memory 21, and the feature data that yields the smallest sum is designated as the most approximate feature data.

[0057] Figure 10 is a view of the data structure of a compressed audio content in this example. The audio data structure includes a header that contains information such as the length and compression method of a sound, the bit rate at which the data is read, the speed at which the data is reproduced after decompression, and the amount of data (frame) to be read at one time. In the code excited linear prediction (CELP) audio coding of MPEG4, the prediction coefficients obtained when the sound is predicted by linear predictive coding (LPC) are coded as tone information. The prediction error is separately coded as sound source information (amplitude information) and arranged in pairs with the tone information at predetermined time intervals (for each frame).
[0058] A method for extracting feature data from the data of a compressed audio content with the above construction will be described.

[0059] In the case of extracting tone information of an object as feature data, the content retrieval section 2 sequentially scans the compressed contents stored in the disk 11 of the content storage section 1 to read the LPC coefficients of each of the contents for each frame, obtains an average of the LPC coefficients for each predetermined time, and stores the average in the feature data memory 21 as feature data of the tone information. In this way, tone information is extracted for all multimedia contents stored in the disk 11 and sequentially stored in the feature data memory 21.
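The tone feature of paragraph [0059] averages per-frame LPC coefficient vectors over a predetermined time. The Python sketch below assumes that the predetermined time is a fixed number of frames (`frames_per_period`, a hypothetical parameter) and that every frame carries an LPC vector of the same order. The same windowed averaging would apply to the spectrum normalization coefficients used for tone in the MDCT-based example.

```python
from typing import List, Sequence

def tone_feature(lpc_frames: Sequence[Sequence[float]], frames_per_period: int) -> List[List[float]]:
    """Average per-frame LPC coefficient vectors over fixed windows of frames.

    The last window may be shorter if the frame count is not a multiple
    of frames_per_period; it is averaged over the frames it contains.
    """
    if frames_per_period <= 0:
        raise ValueError("frames_per_period must be positive")
    features = []
    for start in range(0, len(lpc_frames), frames_per_period):
        window = lpc_frames[start:start + frames_per_period]
        order = len(window[0])  # LPC order, assumed constant across frames
        features.append([sum(f[k] for f in window) / len(window) for k in range(order)])
    return features

# Hypothetical second-order LPC coefficients for four frames.
frames = [[0.5, -0.25], [0.75, -0.5], [0.25, 0.0], [0.5, 0.25]]
print(tone_feature(frames, frames_per_period=2))  # [[0.625, -0.375], [0.375, 0.125]]
```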

[0060] When the client attempts to retrieve a content containing an object having a desired tone, the client inputs data indicating the desired tone to the client terminal 3. The client terminal 3 transmits the data indicating the tone to the data conversion portion 23 of the content retrieval section 2. The data indicating the tone may be data indicating a hummed tone or data in the same format as that of the feature data stored in the feature data memory 21. If the data transmitted from the client terminal 3 is data indicating the tone, the data conversion portion 23 converts the data indicating the tone into feature data and transfers the converted feature data to the feature extraction/retrieval engine 22. If the data transmitted from the client terminal 3 is in the same format as that of the feature data stored in the feature data memory 21, the data conversion portion 23 transfers the feature data to the feature extraction/retrieval engine 22. The feature extraction/retrieval engine 22 searches the feature data memory 21 to select an object having feature data most approximate to the feature data indicating the tone transmitted from the client terminal 3 and determines a content containing the object. The feature extraction/retrieval engine 22 instructs the file server 12 of the content storage section 1 to retrieve the determined content. The file server 12 reads the content from the disk 11 and supplies the content to the client terminal 3 via the content retrieval section 2.

[0061] The most approximate feature data is obtained as follows. The averages of the LPC coefficients of respective frames that form the feature data transmitted from the client terminal 3 are compared with the averages of the LPC coefficients of respective frames that form the feature data stored in the feature data memory 21 to obtain the absolute values of the differences between each pair of corresponding averages, and these absolute values are summed. This calculation is performed for all the feature data stored in the feature data memory 21, and the feature data that yields the smallest sum (difference) is designated as the most approximate feature data.
[0062] In the case of extracting rhythm information of an object as feature data, the content retrieval section 2 sequentially scans the compressed contents stored in the disk 11 of the content storage section 1 to read a prediction residual value (change in amplitude) of each of the contents for each frame, and stores the value in the feature data memory 21 as feature data of the rhythm information. In this way, rhythm information is extracted for all multimedia contents stored in the disk 11 and sequentially stored in the feature data memory 21.
[0063] When the client attempts to retrieve a content containing an object having a desired rhythm, the client inputs rhythm information to the client terminal 3. The client terminal 3 transmits the rhythm information to the data conversion portion 23 of the content retrieval section 2. The rhythm information may be data indicating a hummed rhythm or data in the same format as that of the feature data stored in the feature data memory 21. If the data transmitted from the client terminal 3 is data indicating a rhythm, the data conversion portion 23 converts the data indicating the rhythm into feature data and transfers the converted feature data to the feature extraction/retrieval engine 22. If the data transmitted from the client terminal 3 is in the same format as that of the feature data stored in the feature data memory 21, the data conversion portion 23 transfers the feature data to the feature extraction/retrieval engine 22. The feature extraction/retrieval engine 22 searches the feature data memory 21 to select an object having feature data most approximate to the feature data indicating the rhythm transmitted from the client terminal 3 and determines a content containing the object. The feature extraction/retrieval engine 22 instructs the file server 12 of the content storage section 1 to retrieve the content. The file server 12 reads the content from the disk 11 and supplies the content to the client terminal 3 via the content retrieval section 2.

[0064] The most approximate feature data is obtained as follows. The prediction residual values (changes in amplitude) for respective frames that form the feature data transmitted from the client terminal 3 are compared with the prediction residual values (changes in amplitude) for respective frames that form the feature data stored in the feature data memory 21 to obtain the absolute values of the differences between each pair of corresponding values, and these absolute values are summed. This calculation is performed for all the feature data stored in the feature data memory 21, and the feature data that yields the smallest sum (difference) is designated as the most approximate feature data.

[0065] In the case of extracting melody information of an object as feature data, the content retrieval section 2 sequentially scans the compressed contents stored in the disk 11 of the content storage section 1 to read the LPC coefficients of the contents for each frame, obtains the temporal changes of the LPC coefficients for respective frames, and stores the temporal changes in the feature data memory 21 as feature data of the melody information. In this way, melody information is extracted for all multimedia contents stored in the disk 11 and sequentially stored in the feature data memory 21.
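Paragraph [0065] does not define "temporal change" precisely. The Python sketch below assumes it means the frame-to-frame difference of each LPC coefficient, i.e. a first difference along the time axis. The same function would serve for the temporal changes of spectrum normalization coefficients in the MDCT-based example.

```python
from typing import List, Sequence

def melody_feature(coeff_frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Frame-to-frame change of each coefficient (assumed first difference).

    For n input frames this returns n - 1 difference vectors.
    """
    return [
        [cur_k - prev_k for prev_k, cur_k in zip(prev, cur)]
        for prev, cur in zip(coeff_frames, coeff_frames[1:])
    ]

# Hypothetical LPC coefficients for three consecutive frames.
frames = [[0.5, -0.25], [0.75, -0.5], [0.25, 0.0]]
print(melody_feature(frames))  # [[0.25, -0.25], [-0.5, 0.5]]
```

Differences make the feature insensitive to a constant offset in the coefficients, which is one plausible reason to store changes rather than raw values for melody.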
[0066] When the client attempts to retrieve a content containing an object having a desired melody, the client inputs data indicating the melody to the client terminal 3. The client terminal 3 transmits the data indicating the melody to the data conversion portion 23 of the content retrieval section 2. The data indicating a melody may be data indicating a hummed melody or data in the same format as that of the feature data stored in the feature data memory 21. If the data transmitted from the client terminal 3 is data indicating a melody, the data conversion portion 23 converts the data indicating the melody into feature data and transfers the converted feature data to the feature extraction/retrieval engine 22. If the data transmitted from the client terminal 3 is in the same format as that of the feature data stored in the feature data memory 21, the data conversion portion 23 transfers the feature data to the feature extraction/retrieval engine 22. The feature extraction/retrieval engine 22 searches the feature data memory 21 to select an object having feature data most approximate to the feature data indicating the melody transmitted from the client terminal 3 and determines a content containing the object. The feature extraction/retrieval engine 22 instructs the file server 12 of the content storage section 1 to retrieve the determined content. The file server 12 reads the content from the disk 11 and supplies the content to the client terminal 3 via the content retrieval section 2.

EP 0 971 296 A2

[0067] The most approximate feature data is obtained as follows. The temporal changes of the LPC coefficients for respective frames that form the feature data transmitted from the client terminal 3 are compared with the temporal changes of the LPC coefficients for respective frames that form the feature data stored in the feature data memory 21 to obtain the absolute values of the differences between each pair of corresponding values, and these absolute values are summed. This calculation is performed for all the feature data stored in the feature data memory 21, and the feature data that yields the smallest sum (difference) is designated as the most approximate feature data.

[0068] Figure 11 illustrates the data structure of a compressed audio content in this example. The audio data structure includes a header that contains information such as the sampling frequency and compression method of an audio signal, the bit rate at which the data is read, the speed at which the data is reproduced after decompression, and the amount of data (frame) to be read at one time. In the time/frequency conversion coding of MPEG4, the frequency spectrum of an audio signal is analyzed by frequency analysis or the like to extract a spectral envelope value. The extracted value is coded as a spectrum normalization coefficient and is also used to normalize the frequency components. More specifically, the frequency components are obtained by applying the modified discrete cosine transform (MDCT) to the audio signal and are divided by the extracted value to normalize their amplitude. The temporal redundancy of the normalized frequency components is reduced by prediction coding, and the redundancy between channels is reduced by inter-channel prediction coding. The frequency components thus processed are quantized and variable-length coded, and the resulting values are arranged sequentially, together with the spectrum normalization coefficients, for each time period (frame).
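The MDCT-and-normalize step of paragraph [0068] can be illustrated with a small Python sketch. It uses the textbook MDCT definition X[k] = sum over n of x[n] cos(pi/N (n + 1/2 + N/2)(k + 1/2)) on a 2N-sample frame, without a window. The spectral envelope is approximated by the RMS of each fixed-size band; that choice, the band size, and the test signal are assumptions, not the envelope analysis an actual MPEG4 encoder performs.

```python
import math
from typing import List, Tuple

def mdct(frame: List[float]) -> List[float]:
    """Direct O(N^2) MDCT of a 2N-sample frame (no window, illustration only)."""
    two_n = len(frame)
    if two_n % 2:
        raise ValueError("frame length must be even")
    n_half = two_n // 2
    return [
        sum(x * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n, x in enumerate(frame))
        for k in range(n_half)
    ]

def normalize_spectrum(coeffs: List[float], band_size: int) -> Tuple[List[float], List[float]]:
    """Divide each band of MDCT coefficients by that band's envelope value.

    The envelope is the band RMS, an assumed stand-in for the codec's
    spectral envelope. Returns (envelope values, normalized coefficients).
    """
    envelope, normalized = [], []
    for start in range(0, len(coeffs), band_size):
        band = coeffs[start:start + band_size]
        rms = math.sqrt(sum(c * c for c in band) / len(band)) or 1.0  # avoid /0
        envelope.append(rms)
        normalized.extend(c / rms for c in band)
    return envelope, normalized

# Hypothetical 16-sample frame: three cycles of a sine wave.
frame = [math.sin(2 * math.pi * 3 * n / 16) for n in range(16)]
env, norm = normalize_spectrum(mdct(frame), band_size=4)
print([round(e, 3) for e in env])
```

In this sketch `env` plays the role of the spectrum normalization coefficients that paragraphs [0070] and [0076] read as tone and melody features, and `norm` plays the role of the normalized frequency components used for rhythm in [0073].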
[0069] A method for extracting feature data from a 
compressed audio content with the above construction 
will be described. 

[0070] In the case of extracting tone information of an object as feature data, the content retrieval section 2 sequentially scans the compressed contents stored in the disk 11 of the content storage section 1 to read the spectrum normalization coefficients of each of the contents for each frame, obtains averages of the spectrum normalization coefficients for each predetermined time period, and stores the averages in the feature data memory 21 as feature data of the tone information. In this way, tone information is extracted for all multimedia contents stored in the disk 11 and sequentially stored in the feature data memory 21.

[0071] When the client attempts to retrieve a content containing an object having a desired tone, the client inputs data indicating the desired tone to the client terminal 3. The client terminal 3 transmits the data indicating the tone to the data conversion portion 23 of the content retrieval section 2. The data indicating the tone may be data indicating a hummed tone or data in the same format as that of the feature data stored in the feature data memory 21. If the data transmitted from the client terminal 3 is data indicating a tone, the data conversion portion 23 converts the data indicating the tone into feature data and transfers the converted feature data to the feature extraction/retrieval engine 22. If the data transmitted from the client terminal 3 is in the same format as that of the feature data stored in the feature data memory 21, the data conversion portion 23 transfers the feature data to the feature extraction/retrieval engine 22. The feature extraction/retrieval engine 22 searches the feature data memory 21 to select an object having feature data most approximate to the feature data indicating the tone transmitted from the client terminal 3 and determines a content containing the object. The feature extraction/retrieval engine 22 instructs the file server 12 of the content storage section 1 to retrieve the determined content. The file server 12 reads the content from the disk 11 and supplies the content to the client terminal 3 via the content retrieval section 2.

[0072] The most approximate feature data is obtained as follows. The averages of the spectrum normalization coefficients for each predetermined time period that form the feature data transmitted from the client terminal 3 are compared with the averages of the spectrum normalization coefficients for each predetermined time period that form the feature data stored in the feature data memory 21 to obtain the absolute values of the differences between each pair of corresponding averages, and these absolute values are summed. This calculation is performed for all the feature data stored in the feature data memory 21, and the feature data that yields the smallest sum (difference) is designated as the most approximate feature data.

[0073] In the case of extracting rhythm information of an object as feature data, the content retrieval section 2 sequentially scans the compressed contents stored in the disk 11 of the content storage section 1 to read a frequency component value after spectrum normalization (change in amplitude) of each of the contents for each frame, and stores the frequency component value in the feature data memory 21 as feature data of the rhythm information. In this way, rhythm information is extracted for all multimedia contents stored in the disk 11 and sequentially stored in the feature data memory 21.
[0074] When the client attempts to retrieve a content 
containing an object having desired rhythm, the client 
inputs rhythm information to the client terminal 3. The 
client terminal 3 transmits the rhythm information to the 
data conversion portion 23 of the content retrieval sec- 
tion 2. The rhythm information may be data indicating a 
hummed rhythm or data in the same format as that of 
the feature data stored in the feature data memory 21. If 
the data transmitted from the client terminal 3 is data 
indicating a rhythm, the data conversion portion 23 con- 
verts the data indicating a rhythm into feature data, and 
transfers the converted feature data to the feature 
extraction/retrieval engine 22. If the data transmitted 
from the client terminal 3 is in the same format as that of 
the feature data stored in the feature data memory 21, 
the data conversion portion 23 transfers the feature data 
to the feature extraction/retrieval engine 22. The feature 
extraction/retrieval engine 22 searches the feature data 
memory 21 to select an object having feature data most 
approximate to the feature data indicating the rhythm 
transmitted from the client terminal 3 and determines a 
content containing the object. The feature extrac- 
tion/retrieval engine 22 instructs the file server 12 of the 
content storage section 1 to retrieve the content. The file 
server 12 reads the content from the disk 11, and sup- 
plies the content to the client terminal 3 via the content 
retrieval section 2. 

[0075] The method for obtaining the most approximate 
feature data is as follows. Frequency component values 
after spectrum normalization (changes in amplitude) for 
respective frames as feature data transmitted from the 
client terminal 3 are compared with frequency compo- 
nent values after spectrum normalization (changes in 
amplitude) for respective frames as feature data stored 
in the feature data memory 21, to obtain absolutes of 
the differences between the corresponding two values 
and then calculate the sum of the absolutes. This calcu- 
lation is performed for all the feature data stored in the 
feature data memory 21, and feature data which pro- 
vides the smallest sum (difference) is designated as the 
most approximate feature data. 

[0076] In the case of extracting melody information of an object as feature data, the content retrieval section 2 sequentially scans the compressed contents stored in the disk 11 of the content storage section 1 to read the spectrum normalization coefficients of the contents for each frame, obtains the temporal changes of the spectrum normalization coefficients for respective frames, and stores the temporal changes in the feature data memory 21 as feature data of the melody information. In this way, melody information is extracted for all multimedia contents stored in the disk 11 and sequentially stored in the feature data memory 21.
[0077] When the client attempts to retrieve a content containing an object having a desired melody, the client inputs data indicating the melody to the client terminal 3. The client terminal 3 transmits the data indicating the melody to the data conversion portion 23 of the content retrieval section 2. The data indicating a melody may be data indicating a hummed melody or data in the same format as that of the feature data stored in the feature data memory 21. If the data transmitted from the client terminal 3 is data indicating a melody, the data conversion portion 23 converts the data indicating the melody into feature data and transfers the converted feature data to the feature extraction/retrieval engine 22. If the data transmitted from the client terminal 3 is in the same format as that of the feature data stored in the feature data memory 21, the data conversion portion 23 transfers the feature data to the feature extraction/retrieval engine 22. The feature extraction/retrieval engine 22 searches the feature data memory 21 to select an object having feature data most approximate to the feature data indicating the melody transmitted from the client terminal 3 and determines a content containing the object. The feature extraction/retrieval engine 22 instructs the file server 12 of the content storage section 1 to retrieve the determined content. The file server 12 reads the content from the disk 11 and supplies the content to the client terminal 3 via the content retrieval section 2.

[0078] The most approximate feature data is obtained as follows. The temporal changes of the spectrum normalization coefficients for respective frames that form the feature data transmitted from the client terminal 3 are compared with the temporal changes of the spectrum normalization coefficients for respective frames that form the feature data stored in the feature data memory 21 to obtain the absolute values of the differences between each pair of corresponding values, and these absolute values are summed for each item of feature data. This calculation is performed for all the feature data stored in the feature data memory 21, and the feature data that yields the smallest sum (difference) is designated as the most approximate feature data.

[0079] Figure 12 illustrates the data structure of a compressed multimedia content in this example. In the MPEG4 coding method, a multimedia content is composed of objects, and the respective objects are recorded after compression. Each item of compressed object data has an object description affixed to it, in which a summary of the object is described as text.
[0080] A method for extracting feature data from the 
compressed content with the above construction will be 
described. 

[0081] Herein, the case of extracting word information found in the object description as feature data will be described. The content retrieval section 2 sequentially scans the compressed contents stored in the disk 11 of the content storage section 1 and reads the object description of each object. More specifically, the frequency of appearance of each word used in the object description, as well as the frequency of appearance of each combination of a word with its preceding or following word, is determined, and these frequencies of appearance are stored in the feature data memory 21 as feature data of the word information. In this way, feature data of the word information is extracted from the object descriptions of all multimedia contents stored in the disk 11 and stored sequentially in the feature data memory 21.
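The word-frequency feature of paragraph [0081] can be sketched in Python with `collections.Counter`. The regular-expression tokenizer is an assumption (the text does not say how descriptions are split into words), and the sample description is hypothetical. Counting adjacent pairs captures both "a word with its following word" and "a word with its preceding word", since each pair is the same combination seen from either side.

```python
import re
from collections import Counter
from typing import Dict

def word_features(description: str) -> Dict[str, Counter]:
    """Count single words and adjacent word pairs in an object description."""
    words = re.findall(r"[a-z']+", description.lower())  # assumed tokenizer
    return {
        "words": Counter(words),
        # Each adjacent pair is a word combined with its following word.
        "pairs": Counter(zip(words, words[1:])),
    }

# Hypothetical object description text.
feats = word_features("A red ball rolls. The red ball stops.")
print(feats["words"]["red"], feats["pairs"][("red", "ball")])  # 2 2
```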
[0082] When the client attempts to retrieve a multimedia content having an object description that includes a desired word or combination of words, the client inputs the word or combination of words to the client terminal 3. The client terminal 3 transmits the word or combination of words to the data conversion portion 23 of the content retrieval section 2. The content retrieval section 2 sequentially compares the word or combination of words with the feature data of word information stored in the feature data memory 21, selects the feature data of word information having the highest frequency of appearance for the word or combination of words desired by the client, and determines the object having the selected feature data and thus the content including the object. The content retrieval section 2 instructs the file server 12 of the content storage section 1 to transmit the determined content. The file server 12 reads the content from the disk 11 and supplies the content to the client terminal 3 via the content retrieval section 2.
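The selection rule of paragraph [0082], "highest frequency of appearance", can be sketched in Python as a scan over per-object frequency tables. This is a minimal illustration: it assumes each table is a `Counter` keyed by word tuples (one word or a two-word combination), and the object identifiers and counts are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

def best_object(query: Tuple[str, ...], index: Dict[str, Counter]) -> Optional[str]:
    """Return the object whose description uses the query term most often.

    Keys are word tuples: ("sunset",) for one word, ("red", "sky") for a
    combination. Returns None if no description contains the query.
    """
    best_id: Optional[str] = None
    best_count = 0
    for obj_id, counts in index.items():
        if counts[query] > best_count:  # Counter yields 0 for absent keys
            best_id, best_count = obj_id, counts[query]
    return best_id

# Hypothetical feature data of word information for two objects.
index = {
    "content_1/object_1": Counter({("red",): 2, ("sky",): 1, ("red", "sky"): 1}),
    "content_2/object_3": Counter({("sunset",): 3, ("sea",): 1, ("red",): 1}),
}
print(best_object(("red", "sky"), index))  # content_1/object_1
print(best_object(("sunset",), index))     # content_2/object_3
print(best_object(("snow",), index))       # None
```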
[0083] The above-described processing can be imple- 
mented in the form of a computer program. 
[0084] Thus, in this example, feature data is created in 
advance as retrieval data by directly extracting from 
each of the compressed contents. This allows for easy 
retrieval and extraction of a content having a desired 
feature. 

[0085] For example, retrieval of a content based on shape is possible, realizing visual retrieval and extraction of a desired content. Retrieval of a content based on color and brightness is also possible, realizing retrieval and extraction of multimedia contents that are difficult to express in words. Further, retrieval of a content by motion is possible, realizing retrieval and extraction of motion picture contents in addition to still picture information. Retrieval of a content by texture is also possible, realizing retrieval and extraction of multimedia contents having complicated patterns. Likewise, retrieval of a content by tone is possible, realizing sound-based retrieval and extraction of music or voice contents. Retrieval of a content by rhythm is possible, realizing intuitive retrieval and extraction of music or voice contents. Further, retrieval of a content by melody is possible, realizing direct retrieval and extraction of music or voice contents. Furthermore, retrieval of a content by a word used in the description of the content is possible, realizing retrieval and extraction of multimedia contents based on a description term.
(Example 2) 

[0086] Figure 13 is a view illustrating the configuration of a multimedia content retrieval device of Example 2 according to the present invention. Referring to Figure 13, a multimedia content retrieval device 50 includes a content storage section 51, a content retrieval section 52, a client terminal 53, and communication lines 91 and 92. The content storage section 51 stores a plurality of compressed contents representing images, sounds, and the like. The content retrieval section 52 accesses the content storage section 51 to retrieve a content. The client terminal 53 requests the content retrieval section 52 to retrieve a content. Normally, a plurality of content storage sections 51 and a plurality of client terminals 53 are connected via communication lines so that each of the client terminals 53 can access any of the content storage sections 51 via the content retrieval section 52.

[0087] The content storage section 51 includes a file server 62 and a disk drive 63. The disk drive 63 records and reproduces a plurality of compressed contents on and from a disk 61. The file server 62 controls the disk drive 63 to control the recording and reproduction of contents on and from the disk 61, and performs data communication with external terminals via the communication line 91.

[0088] The content retrieval section 52 is connected to the content storage section 51 via the communication line 91. The content retrieval section 52 extracts features of the objects included in each content for all contents stored in the disk 61, and stores the extracted low-level feature data, such as shape, color, brightness, and motion, in a feature data memory 71.
[0089] The client terminal 53 includes a computer, a keyboard, a memory, a display, and the like. Upon receipt of a feature description text describing a feature of a desired content, entered by the client through the keyboard or the like, the client terminal 53 transmits the feature description text to the content retrieval section 52 via the communication line 92.
[0090] The content retrieval section 52 extracts a keyword from the received feature description text and converts the keyword into low-level feature data. The resulting low-level feature data is sequentially compared with the feature data stored in the feature data memory 71 to select the feature data most approximate to the converted feature data and to determine the object having the selected feature data and thus the content including the object. The content retrieval section 52 retrieves the determined content from the content storage section 51 and sends either the retrieved content or the address at which the content is recorded on the disk 61 to the client terminal 53, thereby achieving the retrieval of the content desired by the client.
[0091] Figure 15 illustrates the construction of the
content retrieval section 52 in more detail. The content 
retrieval section 52 includes: a feature extrac- 
tion/retrieval engine 72 connected to the communication 
line 91; the feature data memory 71 connected to the 
feature extraction/retrieval engine 72; a keyword extrac- 
tor/translator 74 connected to the communication line 
92 and the feature extraction/retrieval engine 72; and a 
keyword dictionary 73 connected to the keyword extrac- 
tor/translator 74. 

[0092] When a feature description text is supplied to 
the content retrieval section 52 from the client terminal 
53, the keyword extractor/translator 74 extracts a key- 
word from the feature description text. As a keyword, a 
word or a combination of words which is a noun, verb, 
adjective, adverb, or the like is extracted from the text. 
For example, when an expression "a scene where a 
person is running in the evening sun" is input as a fea- 
ture description text, words and combinations of words 
found in the text, such as "person", "running", and 
"evening sun" are extracted from the feature description 
text. The extracted words or combination of words are 
compared with keywords registered in the keyword dic- 
tionary 73 as shown in Figure 16 to search for a key- 
word matching with each of the above words and 
combination of words. Assuming that keywords "per- 
son", "run", "evening sun", and the like are registered in 
the keyword dictionary 73 as shown in Figure 16, the 
keywords matching with the respective words and com- 
bination of words are retrieved. 

[0093] The keyword extractor/translator 74 converts each of the retrieved keywords into feature data using the keyword dictionary 73. For example, the keyword "evening sun" is converted into five feature data items: [shape: round, color: red, brightness: 192, motion: (0, -1), texture: even]. The keyword "person" is converted into four feature data items: [shape: human-like, color: skin color, brightness: 128, texture: skin-like]. The keyword "run" is converted into one feature data item: [motion: (±10, 0)]. These feature data are sent to the feature extraction/retrieval engine 72.
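Paragraphs [0092] and [0093] describe matching words from the feature description text against the keyword dictionary 73 and translating each hit into feature data. The Python sketch below is one possible reading, not the device's actual algorithm. `KEYWORD_DICTIONARY` copies the example entries from the text, with "(±10, 0)" stored as a set of two motion vectors. `SURFACE_FORMS` is a hypothetical inflection table that lets "running" match the headword "run". Matching is greedy longest-match over at most two words, which is also an assumption.

```python
import re
from typing import Dict, List, Tuple

# Keyword dictionary entries taken from the examples in the text.
KEYWORD_DICTIONARY: Dict[str, Dict[str, object]] = {
    "evening sun": {"shape": "round", "color": "red", "brightness": 192,
                    "motion": (0, -1), "texture": "even"},
    "person": {"shape": "human-like", "color": "skin color",
               "brightness": 128, "texture": "skin-like"},
    "run": {"motion": {(10, 0), (-10, 0)}},  # (±10, 0): either direction
}
# Hypothetical table mapping inflected forms to dictionary headwords.
SURFACE_FORMS: Dict[str, str] = {"running": "run", "runs": "run", "ran": "run"}

def extract_keywords(text: str, max_words: int = 2) -> List[str]:
    """Greedy longest-match lookup of dictionary keywords in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    found: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(max_words, 0, -1):  # try the longest phrase first
            phrase = " ".join(tokens[i:i + n])
            key = SURFACE_FORMS.get(phrase, phrase)
            if key in KEYWORD_DICTIONARY:
                found.append(key)
                i += n
                break
        else:
            i += 1  # no keyword starts at this token
    return found

def translate(keywords: List[str]) -> List[Tuple[str, Dict[str, object]]]:
    """Convert each keyword into its low-level feature data."""
    return [(k, KEYWORD_DICTIONARY[k]) for k in keywords]

text = "a scene where a person is running in the evening sun"
for keyword, features in translate(extract_keywords(text)):
    print(keyword, features)
```

Trying two-word phrases first ensures "evening sun" is found as one keyword rather than being split into two unknown words.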

[0094] The feature extraction/retrieval engine 72 compares the feature data of each keyword supplied from the keyword extractor/translator 74 with the feature data of the respective objects stored in the feature data memory 71, as shown in Figure 14, selects the object whose feature data is most approximate to the supplied feature data, determines the content that includes the object, and instructs the file server 62 of the content storage section 51 to retrieve the content. The file server 62 reads the content from the disk 61 and supplies it to the client terminal 53 via the content retrieval section 52. Alternatively, the feature extraction/retrieval engine 72 may supply the address of the content on the disk 61 of the content storage section 51 to the client terminal 53.
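The selection and retrieval steps of [0094] can be sketched as a nearest-neighbour search followed by a file-server lookup. `StoredObject`, `FileServer`, `retrieve`, and the toy `brightness_only` measure are hypothetical names; the difference measure is passed in as a parameter because the comparison depends on the feature type (see [0099]).

```python
from dataclasses import dataclass
from typing import Callable, Union

Features = dict[str, float]


@dataclass
class StoredObject:
    """One row of the feature data memory 71 (layout is an assumption)."""
    content_id: str
    object_id: int
    features: Features


class FileServer:
    """Hypothetical stand-in for the file server 62 and the disk 61."""

    def __init__(self, disk: dict[str, tuple[int, bytes]]):
        self.disk = disk  # content_id -> (address on the disk, content data)

    def address_of(self, content_id: str) -> int:
        return self.disk[content_id][0]

    def read(self, content_id: str) -> bytes:
        return self.disk[content_id][1]


def retrieve(
    query: Features,
    memory: list[StoredObject],
    difference: Callable[[Features, Features], float],
    server: FileServer,
    return_address: bool = False,
) -> Union[bytes, int]:
    # Select the stored object whose feature data is most approximate to the query.
    best = min(memory, key=lambda obj: difference(query, obj.features))
    # Either hand back the content's address or have the file server read the content.
    if return_address:
        return server.address_of(best.content_id)
    return server.read(best.content_id)


def brightness_only(query: Features, features: Features) -> float:
    """Toy difference measure used only for this example."""
    return abs(query["brightness"] - features["brightness"])


memory = [
    StoredObject("clip-A", 1, {"brightness": 60.0}),
    StoredObject("clip-B", 3, {"brightness": 185.0}),
    StoredObject("clip-C", 2, {"brightness": 128.0}),
]
server = FileServer({"clip-A": (0x0000, b"A"), "clip-B": (0x4000, b"B"), "clip-C": (0x8000, b"C")})
print(retrieve({"brightness": 192.0}, memory, brightness_only, server))  # b'B'
print(retrieve({"brightness": 192.0}, memory, brightness_only, server, return_address=True))  # 16384
```

Setting `return_address=True` corresponds to the alternative in which only the address of the content on the disk 61 is supplied to the client terminal 53.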
[0095] The feature data for the contents stored in the feature data memory 71 may be created manually during production of the contents, or may be automatically extracted from the contents and stored.
[0096] In Example 2, as in Example 1, since the amount of data in multimedia contents representing images, sounds, and the like is enormous, the contents are normally compressed by a compression coding method such as MPEG before being recorded on the disk 61 of the content storage section 51.
[0097] Therefore, as with the feature extraction/retrieval engine 22 in Example 1, the feature extraction/retrieval engine 72 can sequentially scan the compressed contents stored on the disk 61 of the content storage section 51, extract the shape, color and brightness, motion, texture, and the like of each object as feature data of that object, and store the feature data in the feature data memory 71. In this way, feature data can be extracted from all the multimedia contents stored on the disk 61 and stored sequentially in the feature data memory 71, so that the data structure shown in Figure 14 is established in the feature data memory 71. In Figures 14 and 16, "shape 1" of the object item represents shape feature data extracted from macro blocks as shown in Figure 4, and "shape 2" represents shape feature data extracted from a wire-frame model as shown in Figure 5.
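The extraction of low-level feature data from compressed macro blocks can be illustrated with the shape, color and brightness, and motion features recited in claims 2, 4, and 5. The sketch assumes the stream has already been entropy-decoded into per-macro-block values (the hypothetical `MacroBlock` record) with four luminance blocks and one Pb and one Pr block per macro block, as in 4:2:0 sampling; averaging the per-macro-block values over the whole object is also an assumption, and quantizer scaling is ignored.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class MacroBlock:
    """Hypothetical per-macro-block values parsed from a compressed stream
    (entropy-decoded only; no inverse DCT or motion compensation)."""
    inside: bool                     # True if the macro block belongs to the object
    y_dc: tuple[int, int, int, int]  # DC coefficients of the four luminance (Y) blocks
    pb_dc: int                       # DC coefficient of the Pb chrominance block
    pr_dc: int                       # DC coefficient of the Pr chrominance block
    motion: tuple[int, int]          # macro block motion vector (dx, dy)


def shape_bits(blocks: list[MacroBlock]) -> int:
    """Shape as one bit per macro block (1 = part of the object), packed in scan order."""
    value = 0
    for mb in blocks:
        value = (value << 1) | int(mb.inside)
    return value


def color_brightness(blocks: list[MacroBlock]) -> tuple[float, float, float]:
    """Mean Y DC per macro block plus the Pb/Pr DC values, averaged over the object.
    Object-level averaging is an assumption; needs at least one object macro block."""
    inside = [mb for mb in blocks if mb.inside]
    y = mean(mean(mb.y_dc) for mb in inside)
    pb = mean(mb.pb_dc for mb in inside)
    pr = mean(mb.pr_dc for mb in inside)
    return y, pb, pr


def average_motion(blocks: list[MacroBlock]) -> tuple[float, float]:
    """Average of the motion vectors of the object's macro blocks."""
    inside = [mb for mb in blocks if mb.inside]
    return mean(mb.motion[0] for mb in inside), mean(mb.motion[1] for mb in inside)


# A 2 x 2 group of macro blocks in scan order; the third one lies outside the object.
blocks = [
    MacroBlock(True, (100, 104, 96, 100), 120, 140, (0, -1)),
    MacroBlock(True, (80, 80, 80, 80), 124, 136, (0, -1)),
    MacroBlock(False, (10, 10, 10, 10), 128, 128, (5, 5)),
    MacroBlock(True, (90, 90, 90, 90), 122, 138, (3, -1)),
]
print(shape_bits(blocks), color_brightness(blocks), average_motion(blocks))
# 13 (90, 122, 138) (1, -1)
```

Because only DC coefficients and motion vectors are read, this sketch needs no inverse DCT, which is what makes scanning compressed contents inexpensive.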

[0098] In the keyword dictionary 73 shown in Figure 16, keywords and their feature data are registered in advance for all words and combinations of words anticipated to appear in feature description texts used to retrieve multimedia contents. For example, for the combination of words "evening sun", which is anticipated to appear in feature description texts, the keyword "evening sun" is registered in the keyword dictionary 73 together with feature data representing its shape, its color and brightness, its motion, and its texture. All of these feature data are registered in the same format as the feature data stored in the feature data memory 71.

[0099] As with the feature extraction/retrieval engine 22 in Example 1, the feature extraction/retrieval engine 72 compares the feature data of a keyword supplied from the keyword extractor/translator 74 with the feature data of the objects stored in the feature data memory 71, and selects the feature data most approximate to the supplied feature data. In Example 2, as in Example 1, this selection among all the feature data stored in the feature data memory 71 is made using a comparison method specific to each attribute of an object, such as its shape, color and brightness, motion, and texture.
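One way to realize attribute-specific comparison is a table of comparator functions, one per attribute, summed over only the attributes the keyword actually specifies (so "run", which carries only motion data, is compared on motion alone). The names `COMPARATORS` and `difference`, the scaling constants, the 0/1 treatment of labels, and the penalty for a missing attribute are all assumptions, not values from the patent.

```python
import math
from typing import Any, Callable


def categorical(wanted: Any, actual: Any) -> float:
    """Shape, color and texture labels: 0 if equal, 1 otherwise."""
    return 0.0 if wanted == actual else 1.0


def brightness(wanted: float, actual: float) -> float:
    """Absolute difference scaled to [0, 1], assuming 8-bit brightness values."""
    return abs(wanted - actual) / 255.0


def motion(alternatives: list[tuple[float, float]], actual: tuple[float, float]) -> float:
    """Distance to the closest acceptable vector, saturating at a hypothetical 16 pixels."""
    nearest = min(math.dist(alt, actual) for alt in alternatives)
    return min(nearest / 16.0, 1.0)


COMPARATORS: dict[str, Callable[[Any, Any], float]] = {
    "shape": categorical,
    "color": categorical,
    "texture": categorical,
    "brightness": brightness,
    "motion": motion,
}


def difference(keyword_features: dict[str, Any], object_features: dict[str, Any]) -> float:
    """Sum the per-attribute differences over the attributes the keyword specifies."""
    total = 0.0
    for attribute, wanted in keyword_features.items():
        if attribute not in object_features:
            total += 1.0  # assumption: a missing attribute counts as a full mismatch
        else:
            total += COMPARATORS[attribute](wanted, object_features[attribute])
    return total


runner = {"shape": "human-like", "motion": (-8, 0)}
print(difference({"motion": [(10, 0), (-10, 0)]}, runner))  # 0.125
```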

[0100] When a plurality of keywords are extracted from a feature description text input via the client terminal 53, the feature data most approximate to the feature data of each keyword are retrieved for each content, and the differences between these feature data are summed to obtain an overall difference between the content and the feature description text. By comparing the overall differences of all the contents, the content most approximate to the feature description desired by the client, that is, the content with the smallest overall difference, can be retrieved.
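The overall-difference ranking of [0100] can be sketched as follows: for each content, take the closest object per keyword, sum those minima, and sort in ascending order. `rank_contents` is a hypothetical name, and `l1_difference` is only a toy stand-in for the per-attribute comparison of [0099].

```python
from collections import defaultdict
from typing import Callable

Features = dict[str, float]


def rank_contents(
    keyword_features: list[Features],
    memory: list[tuple[str, Features]],
    difference: Callable[[Features, Features], float],
) -> list[tuple[str, float]]:
    """Return (content_id, overall difference) pairs, smallest overall difference first."""
    objects_by_content: dict[str, list[Features]] = defaultdict(list)
    for content_id, features in memory:
        objects_by_content[content_id].append(features)

    overall: dict[str, float] = {}
    for content_id, objects in objects_by_content.items():
        # For each keyword take the closest object in this content, then sum over keywords.
        overall[content_id] = sum(
            min(difference(kf, obj) for obj in objects) for kf in keyword_features
        )
    return sorted(overall.items(), key=lambda item: item[1])


def l1_difference(query: Features, obj: Features) -> float:
    """Toy measure: sum of absolute differences over the attributes the keyword specifies."""
    return sum(abs(value - obj[name]) for name, value in query.items())


memory = [
    ("clip-A", {"brightness": 185.0, "dx": 0.0, "dy": -1.0}),
    ("clip-A", {"brightness": 120.0, "dx": 9.0, "dy": 0.0}),
    ("clip-B", {"brightness": 190.0, "dx": 0.0, "dy": 0.0}),
    ("clip-B", {"brightness": 100.0, "dx": 1.0, "dy": 0.0}),
]
keywords = [{"brightness": 192.0, "dy": -1.0}, {"dx": 10.0}]
print(rank_contents(keywords, memory, l1_difference))
# [('clip-A', 8.0), ('clip-B', 12.0)]
```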

[0101] In MPEG4 coding, as described above, a multimedia content is composed of objects, and the respective objects are recorded after compression. Each item of compressed object data includes an object description in which a summary of the object is given as text.

[0102] With MPEG4 coding as described above, the following procedure is possible. The content retrieval section 52 sequentially scans the compressed contents stored on the disk 61 of the content storage section 51 and reads the object description of each object. The frequency of appearance of each word used in the object description, as well as the frequency of appearance of each word together with its preceding or following word, is stored in the feature data memory 71 as feature data of word information. In this way, feature data of word information is extracted from the object descriptions of all the multimedia contents stored on the disk 61 and stored sequentially in the feature data memory 71.
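The word-information features of [0102] reduce to counting words and adjacent word pairs in each object description. The sketch below uses `collections.Counter`; the hypothetical helpers `word_information` and `build_word_memory`, the tokenizer, the lower-casing, and the dictionary keyed by (content, object) are assumptions, and pairs are counted across sentence boundaries for simplicity.

```python
import re
from collections import Counter


def word_information(description: str) -> Counter:
    """Counts of each word and of each word paired with its following word."""
    words = re.findall(r"[a-z]+", description.lower())
    counts = Counter(words)
    # A pair "a b" records b as following a (and, equivalently, a as preceding b).
    counts.update(f"{a} {b}" for a, b in zip(words, words[1:]))
    return counts


def build_word_memory(contents: dict[str, dict[int, str]]) -> dict[tuple[str, int], Counter]:
    """Scan every object description and store its word information per (content, object)."""
    return {
        (content_id, object_id): word_information(description)
        for content_id, objects in contents.items()
        for object_id, description in objects.items()
    }


memory = build_word_memory({"clip-A": {1: "A red sun sets. The red sun is low."}})
info = memory[("clip-A", 1)]
print(info["red"], info["red sun"], info["sun sets"])  # 2 2 1
```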
[0103] When the client attempts to retrieve a desired content based on a word or combination of words, the content retrieval section 52 extracts a keyword (a word or combination of words) from the input feature description text and compares the extracted keyword sequentially with the word information stored in the feature data memory 71, without consulting the keyword dictionary 73. It selects the feature data of word information having the highest frequency of appearance for the extracted keyword, and determines the object having the selected feature data and thus the content that includes the object. The content retrieval section 52 then instructs the file server 62 of the content storage section 51 to retrieve the determined content. The file server 62 reads the content from the disk 61 and supplies it to the client terminal 53 via the content retrieval section 52.
[0104] When a plurality of keywords are extracted from a feature description text supplied from the client terminal 53, the sum of the frequencies of appearance of the respective keywords is calculated for each content, and the content having the largest sum is selected. This enables retrieval of the content most approximate to the desired content described via the client terminal 53.
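The selection rules of [0103] and [0104] can be sketched over per-object word counts held as `collections.Counter` objects. `best_object_content` follows [0103] (a single keyword, highest count for one object); `best_content` follows [0104], where summing a content's counts over all of its objects is an assumption about how "for each content" is computed. Both names and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def best_object_content(keyword: str, memory: dict[tuple[str, int], Counter]) -> str:
    """[0103]: the content of the object whose word information counts the keyword most often."""
    content_id, _object_id = max(memory, key=lambda key: memory[key][keyword])
    return content_id


def best_content(keywords: list[str], memory: dict[tuple[str, int], Counter]) -> str:
    """[0104]: per content, sum the counts of all keywords over its objects; take the largest."""
    totals: dict[str, int] = defaultdict(int)
    for (content_id, _object_id), counts in memory.items():
        # Counter returns 0 for words that never appear, so absent keywords add nothing.
        totals[content_id] += sum(counts[keyword] for keyword in keywords)
    return max(totals, key=lambda content_id: totals[content_id])


memory = {
    ("clip-A", 1): Counter({"red": 2, "sun": 2, "red sun": 2}),
    ("clip-A", 2): Counter({"runner": 1}),
    ("clip-B", 1): Counter({"sun": 3, "beach": 1}),
}
print(best_object_content("sun", memory))  # clip-B
print(best_content(["red sun", "runner"], memory))  # clip-A
```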

[0105] The above-described processing can be imple- 
mented in the form of a computer program. 
[0106] Thus, in this example, a content having a feature desired by the client can be easily retrieved and extracted by using low-level feature data extracted directly from a compressed content, even if the client inputs a high-level feature description text as the retrieval data for the content.

[0107] For example, a keyword extracted from a feature description text can be easily converted into feature data complying with the format of the feature data stored in the feature data memory, realizing more accurate retrieval and extraction of a desired multimedia content.
[0108] A keyword extracted from a feature description 
text can be easily converted into feature data complying 
with the format of feature data stored in the feature data 
memory, realizing retrieval and extraction of a desired 
multimedia content with a higher probability of success. 
[0109] In the case of extracting shape information 
from a feature description text, retrieval and extraction 
of a desired multimedia content is realized with a higher 
probability of success. 

[0110] In the case of extracting color and brightness information from a feature description text, retrieval of a content based on the color and brightness is possible, realizing more accurate retrieval and extraction of a desired multimedia content.

[0111] In the case of extracting motion information from a feature description text, retrieval of a content based on the motion is possible, realizing more accurate retrieval and extraction of a desired multimedia content.
[0112] In the case of extracting a keyword relating to a texture from a feature description text, retrieval of a content based on the texture information is possible, realizing more accurate retrieval and extraction of a multimedia content having a complicated pattern.
[0113] Retrieval of a content based on a word used in 
the description of the content is possible, realizing 
retrieval and extraction of a multimedia content based 
on a description term. 

[0114] Various other modifications will be apparent to and can be readily made by those skilled in the art without departing from the scope and spirit of this invention. Accordingly, it is not intended that the scope of the claims appended hereto be limited to the description as set forth herein, but rather that the claims be broadly construed.

Claims 

1. A multimedia data retrieval device comprising:

a content storage section for storing a plurality of compressed contents;
a client terminal for inputting feature data;
a feature data storage section for reading feature data extracted from at least one of the compressed contents from the content storage section and storing the feature data of the at least one compressed contents; and
a content retrieval section for selecting feature data approximate to the feature data input via the client terminal among the feature data stored in the feature data storage section, and retrieving a content having the selected feature data from the content storage section.

2. A multimedia data retrieval device according to claim 1, wherein each of the compressed contents includes a plurality of macro blocks representing an image shape, the image shape represented by the macro blocks is converted into a value consisting of at least one bit, and the bit is used as feature data of a shape represented by the content.

3. A multimedia data retrieval device according to claim 1, wherein each of the compressed contents includes mesh-coded data representing an image shape, and the mesh-coded data is used as feature data of a shape represented by the content.

4. A multimedia data retrieval device according to claim 1, wherein each of the compressed contents includes a plurality of macro blocks representing an image shape, an average of DC components of a luminance component (Y) and a DC component of each of chrominance components (Pb, Pr) are obtained for each macro block, and the average and the DC components are used as feature data of color information and brightness information represented by the content.

5. A multimedia data retrieval device according to claim 1, wherein each of the compressed contents includes a plurality of macro blocks representing an image shape, motions of an object represented by macro block motion information are read to obtain an average of the motions of the object, and the average is used as feature data of motion information of the object represented by the content.

6. A multimedia data retrieval device according to claim 1, wherein each of the compressed contents includes a plurality of macro blocks representing an image shape, DC components and AC components of a luminance component and DC components and AC components of a chrominance component of an object represented by the macro blocks are read, and averages of the respective components are obtained and used as feature data of texture information of the object represented by the content.

7. A multimedia data retrieval device according to claim 1, wherein each of the compressed contents includes frames representing sound, LPC coefficients recorded for each frame are read, and an average of the LPC coefficients is obtained and used as feature data of tone information represented by the multimedia content.

8. A multimedia data retrieval device according to claim 1, wherein each of the compressed contents includes frames representing sound, spectrum normalization coefficients recorded for each frame are read, and an average of the spectrum normalization coefficients is obtained for each predetermined time period and used as feature data of tone information.

9. A multimedia data retrieval device according to claim 1, wherein each of the compressed contents includes frames representing sound, a prediction residual recorded for each frame is read, and the prediction residual is used as feature data of rhythm information.

10. A multimedia data retrieval device according to claim 1, wherein each of the compressed contents includes frames representing sound, a frequency component after spectrum normalization performed for each frame is read, and the frequency component is used as feature data of rhythm information.

11. A multimedia data retrieval device according to claim 1, wherein each of the compressed contents includes frames representing sound, LPC coefficients recorded for each frame are read, and a temporal change of the LPC coefficients is used as feature data of melody information.

12. A multimedia data retrieval device according to claim 1, wherein each of the compressed contents includes frames representing sound, spectrum normalization coefficients recorded for each frame are read, and a temporal change of the spectrum normalization coefficients is used as feature data of melody information.

13. A multimedia data retrieval device according to claim 1, wherein each of the compressed contents includes a plurality of objects, an object description recorded for each object is read, and a frequency of appearance of a word, as well as a frequency of appearance of a combination of a word and a preceding or following word, used in the object description are used as feature data of word information.

14. A multimedia data retrieval method comprising the steps of:

storing a plurality of compressed contents;
inputting feature data via a client terminal;
reading feature data extracted from the compressed contents and storing the feature data of the compressed contents; and
selecting feature data approximate to the feature data input via the client terminal among the stored feature data, and retrieving a content having the selected feature data from the stored contents.

15. A multimedia data retrieval device comprising:

a content storage section for storing a plurality of contents;
a client terminal for inputting a feature description text;
a feature data storage section for reading feature data of the contents from the content storage section and storing the feature data of the contents; and
a content retrieval section for extracting a keyword from the feature description text input via the client terminal, converting the keyword into feature data, selecting feature data approximate to the feature data of the keyword among the feature data stored in the feature data storage section, and retrieving a content having the selected feature data from the content storage section.

16. A multimedia content retrieval device according to claim 15, wherein the content retrieval section includes a keyword dictionary for converting a keyword into feature data, and the keyword extracted from the feature description text is converted into the feature data using the keyword dictionary.

17. A multimedia content retrieval device according to claim 15, wherein the content retrieval section extracts a major part of speech from the feature description text to be used as a keyword.

18. A multimedia content retrieval device according to claim 15, wherein the content retrieval section uses shape information of a content as the feature data.

19. A multimedia content retrieval device according to claim 15, wherein the content retrieval section uses color information and brightness information of a content as the feature data.

20. A multimedia content retrieval device according to claim 15, wherein the content retrieval section uses motion information of a content as the feature data.

21. A multimedia content retrieval device according to claim 15, wherein the content retrieval section uses texture information of a compressed content as the feature data.

22. A multimedia data retrieval method comprising the steps of:

storing a plurality of contents;
inputting a feature description text via a client terminal;
reading feature data of the contents and storing the feature data; and
extracting a keyword from the feature description text input via the client terminal, converting the keyword into feature data, selecting feature data approximate to the feature data of the keyword among the stored feature data, and retrieving a content having the selected feature data from the stored contents.
[Drawings: FIG. 4, FIG. 6, FIG. 7, FIG. 9, FIG. 15, FIG. 17; surviving axis labels: "DC component of chrominance data", "Frequency component of luminance or chrominance"]